Overview

Dataset statistics

Number of variables: 20
Number of observations: 338592
Missing cells: 337964
Missing cells (%): 5.0%
Duplicate rows: 291922
Duplicate rows (%): 86.2%
Total size in memory: 51.7 MiB
Average record size in memory: 160.0 B
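
The percentages in the table above follow directly from the raw counts. The short check below is only a sketch: it assumes that "Missing cells (%)" is taken over all cells (variables × observations) and "Duplicate rows (%)" over all rows, which matches the reported figures. The helper name `percent` is made up for this illustration.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 20
n_observations = 338_592
missing_cells = 337_964
duplicate_rows = 291_922


def percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = percent(missing_cells, total_cells)

# Assumption: duplicate rows are measured against the number of rows.
duplicate_pct = percent(duplicate_rows, n_observations)

print(total_cells, missing_pct, duplicate_pct)
```

Almost all missing cells come from a single column (Syotai), which is why a 99.8% missing column translates into roughly 5% of all cells in a 20-column table.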

Variable types

Categorical: 2
Numeric: 18

Warnings

Dataset has 291922 (86.2%) duplicate rows Duplicates
SanchiName has a high cardinality: 77 distinct values High cardinality
RuikeiHonsyoHeiti is highly correlated with RuikeiSyutokuHeichi High correlation
RuikeiHonsyoSyogai is highly correlated with RuikeiFukaSyogai and 1 other field High correlation
RuikeiFukaSyogai is highly correlated with RuikeiHonsyoSyogai and 1 other field High correlation
RuikeiSyutokuHeichi is highly correlated with RuikeiHonsyoHeiti High correlation
RuikeiSyutokuSyogai is highly correlated with RuikeiHonsyoSyogai and 1 other field High correlation
SogoChakukaisu2 is highly correlated with ChuoChakukaisu2 High correlation
SogoChakukaisu3 is highly correlated with ChuoChakukaisu3 High correlation
SogoChakukaisu4 is highly correlated with ChuoChakukaisu4 High correlation
SogoChakukaisu5 is highly correlated with ChuoChakukaisu5 High correlation
ChuoChakukaisu2 is highly correlated with SogoChakukaisu2 High correlation
ChuoChakukaisu3 is highly correlated with SogoChakukaisu3 High correlation
ChuoChakukaisu4 is highly correlated with SogoChakukaisu4 High correlation
ChuoChakukaisu5 is highly correlated with SogoChakukaisu5 High correlation
Syotai has 337964 (99.8%) missing values Missing
RuikeiHonsyoSyogai is highly skewed (γ1 = 27.14302994) Skewed
RuikeiFukaHeichi is highly skewed (γ1 = 23.30715518) Skewed
RuikeiFukaSyogai is highly skewed (γ1 = 32.1404276) Skewed
RuikeiSyutokuSyogai is highly skewed (γ1 = 38.92981855) Skewed
RuikeiHonsyoHeiti has 57106 (16.9%) zeros Zeros
RuikeiHonsyoSyogai has 315907 (93.3%) zeros Zeros
RuikeiFukaHeichi has 216858 (64.0%) zeros Zeros
RuikeiFukaSyogai has 334777 (98.9%) zeros Zeros
RuikeiSyutokuHeichi has 110114 (32.5%) zeros Zeros
RuikeiSyutokuSyogai has 324460 (95.8%) zeros Zeros
SogoChakukaisu1 has 92437 (27.3%) zeros Zeros
SogoChakukaisu2 has 111847 (33.0%) zeros Zeros
SogoChakukaisu3 has 104875 (31.0%) zeros Zeros
SogoChakukaisu4 has 105235 (31.1%) zeros Zeros
SogoChakukaisu5 has 104212 (30.8%) zeros Zeros
ChuoChakukaisu1 has 124139 (36.7%) zeros Zeros
ChuoChakukaisu2 has 143149 (42.3%) zeros Zeros
ChuoChakukaisu3 has 133347 (39.4%) zeros Zeros
ChuoChakukaisu4 has 130810 (38.6%) zeros Zeros
ChuoChakukaisu5 has 127972 (37.8%) zeros Zeros
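
The Zeros and Skewed warnings above rest on two simple column statistics: the share of values equal to zero, and the skewness γ1. The sketch below shows one way to compute both in plain Python. It assumes that γ1 is the sample-adjusted Fisher-Pearson estimator (the one pandas' `Series.skew` computes); the report does not state which estimator it uses, and the function names and toy data are illustrative only.

```python
import math


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Sample-adjusted skewness G1 = sqrt(n(n-1)) / (n-2) * m3 / m2**1.5."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


# Toy column: mostly zeros with one large value, like the *Syogai columns.
toy = [0, 0, 0, 0, 10]
print(zero_share(toy), skewness(toy))
```

A column dominated by zeros with a few very large values produces exactly the combination flagged here: a high zero share and a large positive skewness.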

Reproduction

Analysis started: 2021-04-07 13:10:54.516711
Analysis finished: 2021-04-07 13:12:53.243340
Duration: 1 minute and 58.73 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

Syotai
Categorical

MISSING

Distinct: 23
Distinct (%): 3.7%
Missing: 337964
Missing (%): 99.8%
Memory size: 2.6 MiB

笠松: 142
愛知: 62
佐賀: 62
船橋: 61
兵庫: 58
Other values (18): 243

Length

Max length: 7
Median length: 2
Mean length: 2.24044586
Min length: 2

Characters and Unicode

Total characters: 1407
Distinct characters: 50
Distinct categories: 2
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 0.3%

Sample

1st row: 北海道
2nd row: 北海道
3rd row: 北海道
4th row: 北海道
5th row: 北海道
Value  Count  Frequency (%)
笠松  142  < 0.1%
愛知  62  < 0.1%
佐賀  62  < 0.1%
船橋  61  < 0.1%
兵庫  58  < 0.1%
大井  54  < 0.1%
北海道  32  < 0.1%
香港  28  < 0.1%
川崎  27  < 0.1%
浦和  19  < 0.1%
Other values (13)  83  < 0.1%
(Missing)  337964  99.8%

Histogram of lengths of the category

Value  Count  Frequency (%)
笠松  142  22.6%
佐賀  62  9.9%
愛知  62  9.9%
船橋  61  9.7%
兵庫  58  9.2%
大井  54  8.6%
北海道  32  5.1%
香港  28  4.5%
川崎  27  4.3%
浦和  19  3.0%
Other values (13)  83  13.2%

Most occurring characters

ValueCountFrequency (%)
142
 
10.1%
142
 
10.1%
67
 
4.8%
62
 
4.4%
62
 
4.4%
62
 
4.4%
61
 
4.3%
61
 
4.3%
58
 
4.1%
58
 
4.1%
Other values (40)632
44.9%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  1403  99.7%
Modifier Letter  4  0.3%

Most frequent character per category

ValueCountFrequency (%)
142
 
10.1%
142
 
10.1%
67
 
4.8%
62
 
4.4%
62
 
4.4%
62
 
4.4%
61
 
4.3%
61
 
4.3%
58
 
4.1%
58
 
4.1%
Other values (39)628
44.8%
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

Value  Count  Frequency (%)
Han  1180  83.9%
Katakana  223  15.8%
Common  4  0.3%

Most frequent character per script

ValueCountFrequency (%)
142
 
12.0%
142
 
12.0%
67
 
5.7%
62
 
5.3%
62
 
5.3%
62
 
5.3%
61
 
5.2%
61
 
5.2%
58
 
4.9%
58
 
4.9%
Other values (18)405
34.3%
ValueCountFrequency (%)
30
13.5%
28
12.6%
23
10.3%
23
10.3%
20
9.0%
15
6.7%
14
 
6.3%
13
 
5.8%
13
 
5.8%
8
 
3.6%
Other values (11)36
16.1%
ValueCountFrequency (%)
4
100.0%

Most occurring blocks

Value  Count  Frequency (%)
CJK  1180  83.9%
Katakana  227  16.1%

Most frequent character per block

ValueCountFrequency (%)
142
 
12.0%
142
 
12.0%
67
 
5.7%
62
 
5.3%
62
 
5.3%
62
 
5.3%
61
 
5.2%
61
 
5.2%
58
 
4.9%
58
 
4.9%
Other values (18)405
34.3%
ValueCountFrequency (%)
30
13.2%
28
12.3%
23
10.1%
23
10.1%
20
8.8%
15
 
6.6%
14
 
6.2%
13
 
5.7%
13
 
5.7%
8
 
3.5%
Other values (12)40
17.6%

BreederCode
Real number (ℝ≥0)

Distinct: 2711
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 467007.2923
Minimum: 0
Maximum: 993201
Zeros: 82
Zeros (%): < 0.1%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 33305
Q1: 301513
median: 400015
Q3: 710303
95-th percentile: 910369
Maximum: 993201
Range: 993201
Interquartile range (IQR): 408790

Descriptive statistics

Standard deviation: 265687.4571
Coefficient of variation (CV): 0.5689150071
Kurtosis: -0.9659484301
Mean: 467007.2923
Median Absolute Deviation (MAD): 196710
Skewness: 0.1675955592
Sum: 1.581249331 × 10^11
Variance: 7.058982484 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
373126  28915  8.5%
393126  25961  7.7%
341126  9029  2.7%
913124  5033  1.5%
710303  4691  1.4%
433129  4508  1.3%
811540  4386  1.3%
400018  3713  1.1%
233071  3458  1.0%
301513  3130  0.9%
Other values (2701)  245768  72.6%

Value  Count  Frequency (%)
0  82  < 0.1%
14  2156  0.6%
32  237  0.1%
42  6  < 0.1%
49  283  0.1%

Value  Count  Frequency (%)
993201  19  < 0.1%
990887  6  < 0.1%
990886  1  < 0.1%
990883  1  < 0.1%
990882  3  < 0.1%

SanchiName
Categorical

HIGH CARDINALITY

Distinct: 77
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

新ひだか町: 67675
浦河町: 61365
新冠町: 51425
日高町: 48728
安平町: 32508
Other values (72): 76891

Length

Max length: 5
Median length: 3
Mean length: 3.358531802
Min length: 1

Characters and Unicode

Total characters: 1137172
Distinct characters: 93
Distinct categories: 1
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: 新冠町
2nd row: 新冠町
3rd row: 新冠町
4th row: 新冠町
5th row: 新冠町
ValueCountFrequency (%)
新ひだか町67675
20.0%
浦河町61365
18.1%
新冠町51425
15.2%
日高町48728
14.4%
安平町32508
9.6%
千歳市25958
 
7.7%
白老町7993
 
2.4%
7929
 
2.3%
平取町6241
 
1.8%
様似町3632
 
1.1%
Other values (67)25138
 
7.4%
Histogram of lengths of the category
ValueCountFrequency (%)
新ひだか町67675
20.0%
浦河町61365
18.1%
新冠町51425
15.2%
日高町48728
14.4%
安平町32508
9.6%
千歳市25958
 
7.7%
白老町7993
 
2.4%
7929
 
2.3%
平取町6241
 
1.8%
様似町3632
 
1.1%
Other values (67)25138
 
7.4%

Most occurring characters

ValueCountFrequency (%)
295750
26.0%
119386
10.5%
71385
 
6.3%
67802
 
6.0%
67802
 
6.0%
61919
 
5.4%
61468
 
5.4%
51493
 
4.5%
48827
 
4.3%
48827
 
4.3%
Other values (83)242513
21.3%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  1137172  100.0%

Most frequent character per category

ValueCountFrequency (%)
295750
26.0%
119386
10.5%
71385
 
6.3%
67802
 
6.0%
67802
 
6.0%
61919
 
5.4%
61468
 
5.4%
51493
 
4.5%
48827
 
4.3%
48827
 
4.3%
Other values (83)242513
21.3%

Most occurring scripts

Value  Count  Frequency (%)
Han  919687  80.9%
Hiragana  217485  19.1%

Most frequent character per script

ValueCountFrequency (%)
295750
32.2%
119386
13.0%
61919
 
6.7%
61468
 
6.7%
51493
 
5.6%
48827
 
5.3%
48827
 
5.3%
38775
 
4.2%
32518
 
3.5%
27069
 
2.9%
Other values (75)133655
14.5%
ValueCountFrequency (%)
71385
32.8%
67802
31.2%
67802
31.2%
3583
 
1.6%
3583
 
1.6%
1110
 
0.5%
1110
 
0.5%
1110
 
0.5%

Most occurring blocks

Value  Count  Frequency (%)
CJK  919687  80.9%
Hiragana  217485  19.1%

Most frequent character per block

ValueCountFrequency (%)
295750
32.2%
119386
13.0%
61919
 
6.7%
61468
 
6.7%
51493
 
5.6%
48827
 
5.3%
48827
 
5.3%
38775
 
4.2%
32518
 
3.5%
27069
 
2.9%
Other values (75)133655
14.5%
ValueCountFrequency (%)
71385
32.8%
67802
31.2%
67802
31.2%
3583
 
1.6%
3583
 
1.6%
1110
 
0.5%
1110
 
0.5%
1110
 
0.5%

RuikeiHonsyoHeiti
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7081
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 334569.6802
Minimum: 0
Maximum: 18132000
Zeros: 57106
Zeros (%): 16.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 18000
median: 136000
Q3: 429400
95-th percentile: 1235500
Maximum: 18132000
Range: 18132000
Interquartile range (IQR): 411400

Descriptive statistics

Standard deviation: 603552.295
Coefficient of variation (CV): 1.803965902
Kurtosis: 106.9730932
Mean: 334569.6802
Median Absolute Deviation (MAD): 136000
Skewness: 7.114155976
Sum: 1.132826172 × 10^11
Variance: 3.642753728 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  57106  16.9%
5000  4875  1.4%
7500  3056  0.9%
70000  2595  0.8%
18000  2347  0.7%
13000  2205  0.7%
7000  2176  0.6%
11000  2138  0.6%
50000  2113  0.6%
20000  1591  0.5%
Other values (7071)  258390  76.3%

Value  Count  Frequency (%)
0  57106  16.9%
2400  13  < 0.1%
2500  21  < 0.1%
2550  7  < 0.1%
3500  13  < 0.1%

Value  Count  Frequency (%)
18132000  17  < 0.1%
14458000  14  < 0.1%
13082000  7  < 0.1%
13058000  9  < 0.1%
12519000  17  < 0.1%

RuikeiHonsyoSyogai
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 727
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14852.01289
Minimum: 0
Maximum: 7506000
Zeros: 315907
Zeros (%): 93.3%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 43000
Maximum: 7506000
Range: 7506000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 124414.6578
Coefficient of variation (CV): 8.376955954
Kurtosis: 1230.613327
Mean: 14852.01289
Median Absolute Deviation (MAD): 0
Skewness: 27.14302994
Sum: 5028772750
Variance: 1.547900709 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  315907  93.3%
7000  556  0.2%
70000  551  0.2%
11000  537  0.2%
7800  518  0.2%
18000  432  0.1%
78000  348  0.1%
12000  314  0.1%
20000  242  0.1%
98000  236  0.1%
Other values (717)  18951  5.6%

Value  Count  Frequency (%)
0  315907  93.3%
7000  556  0.2%
7300  95  < 0.1%
7500  118  < 0.1%
7800  518  0.2%

Value  Count  Frequency (%)
7506000  26  < 0.1%
4559000  9  < 0.1%
4538000  5  < 0.1%
4167000  8  < 0.1%
4023000  4  < 0.1%

RuikeiFukaHeichi
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 2327
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4012.004684
Minimum: 0
Maximum: 1218840
Zeros: 216858
Zeros (%): 64.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 2450
95-th percentile: 15670
Maximum: 1218840
Range: 1218840
Interquartile range (IQR): 2450

Descriptive statistics

Standard deviation: 22605.87382
Coefficient of variation (CV): 5.634558182
Kurtosis: 784.0498938
Mean: 4012.004684
Median Absolute Deviation (MAD): 0
Skewness: 23.30715518
Sum: 1358432690
Variance: 511025531.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  216858  64.0%
500  603  0.2%
1100  582  0.2%
1000  577  0.2%
530  540  0.2%
550  515  0.2%
580  513  0.2%
1080  506  0.1%
520  497  0.1%
1040  472  0.1%
Other values (2317)  116929  34.5%

Value  Count  Frequency (%)
0  216858  64.0%
130  24  < 0.1%
160  19  < 0.1%
170  3  < 0.1%
180  3  < 0.1%

Value  Count  Frequency (%)
1218840  15  < 0.1%
895670  7  < 0.1%
821100  7  < 0.1%
806330  9  < 0.1%
743100  17  < 0.1%

RuikeiFukaSyogai
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 186
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.68510774
Minimum: 0
Maximum: 53370
Zeros: 334777
Zeros (%): 98.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 53370
Range: 53370
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 929.9900656
Coefficient of variation (CV): 17.00627655
Kurtosis: 1415.743534
Mean: 54.68510774
Median Absolute Deviation (MAD): 0
Skewness: 32.1404276
Sum: 18515940
Variance: 864881.522
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  334777  98.9%
420  159  < 0.1%
840  140  < 0.1%
430  116  < 0.1%
860  108  < 0.1%
710  84  < 0.1%
3080  66  < 0.1%
340  63  < 0.1%
1460  58  < 0.1%
4970  52  < 0.1%
Other values (176)  2969  0.9%

Value  Count  Frequency (%)
0  334777  98.9%
250  18  < 0.1%
290  5  < 0.1%
300  22  < 0.1%
310  1  < 0.1%

Value  Count  Frequency (%)
53370  26  < 0.1%
53000  9  < 0.1%
33960  9  < 0.1%
33570  5  < 0.1%
31790  18  < 0.1%

RuikeiSyutokuHeichi
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 898
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 89162.66185
Minimum: 0
Maximum: 9068000
Zeros: 110114
Zeros (%): 32.5%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 40000
Q3: 95000
95-th percentile: 330000
Maximum: 9068000
Range: 9068000
Interquartile range (IQR): 95000

Descriptive statistics

Standard deviation: 223941.2681
Coefficient of variation (CV): 2.511603663
Kurtosis: 275.4020308
Mean: 89162.66185
Median Absolute Deviation (MAD): 40000
Skewness: 12.21341436
Sum: 3.0189764 × 10^10
Variance: 5.014969154 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  110114  32.5%
20000  44496  13.1%
95000  21472  6.3%
40000  18564  5.5%
70000  16807  5.0%
45000  14660  4.3%
90000  12583  3.7%
155000  12443  3.7%
135000  6801  2.0%
150000  5942  1.8%
Other values (888)  74710  22.1%

Value  Count  Frequency (%)
0  110114  32.5%
500  45  < 0.1%
1000  1054  0.3%
1500  840  0.2%
2000  1624  0.5%

Value  Count  Frequency (%)
9068000  14  < 0.1%
7062500  17  < 0.1%
6113000  17  < 0.1%
5408500  15  < 0.1%
5076000  9  < 0.1%

RuikeiSyutokuSyogai
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 85
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4493.520225
Minimum: 0
Maximum: 3640000
Zeros: 324460
Zeros (%): 95.8%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 3640000
Range: 3640000
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 50848.83189
Coefficient of variation (CV): 11.31603494
Kurtosis: 2286.829971
Mean: 4493.520225
Median Absolute Deviation (MAD): 0
Skewness: 38.92981855
Sum: 1521470000
Variance: 2585603704
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
0  324460  95.8%
40000  9707  2.9%
100000  1433  0.4%
160000  600  0.2%
115000  314  0.1%
110000  149  < 0.1%
95000  114  < 0.1%
175000  95  < 0.1%
240000  92  < 0.1%
305000  71  < 0.1%
Other values (75)  1557  0.5%

Value  Count  Frequency (%)
0  324460  95.8%
40000  9707  2.9%
95000  114  < 0.1%
100000  1433  0.4%
105000  13  < 0.1%

Value  Count  Frequency (%)
3640000  26  < 0.1%
2115000  5  < 0.1%
2070000  9  < 0.1%
1825000  4  < 0.1%
1810000  8  < 0.1%

SogoChakukaisu1
Real number (ℝ≥0)

ZEROS

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.026512735
Minimum: 0
Maximum: 26
Zeros: 92437
Zeros (%): 27.3%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 2
Q3: 3
95-th percentile: 6
Maximum: 26
Range: 26
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.970036867
Coefficient of variation (CV): 0.9721315012
Kurtosis: 2.795971226
Mean: 2.026512735
Median Absolute Deviation (MAD): 1
Skewness: 1.278915461
Sum: 686161
Variance: 3.881045259
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Value  Count  Frequency (%)
0  92437  27.3%
1  69887  20.6%
2  57367  16.9%
3  49525  14.6%
4  32839  9.7%
5  18273  5.4%
6  8570  2.5%
7  4728  1.4%
8  2424  0.7%
9  1033  0.3%
Other values (14)  1509  0.4%

Value  Count  Frequency (%)
0  92437  27.3%
1  69887  20.6%
2  57367  16.9%
3  49525  14.6%
4  32839  9.7%

Value  Count  Frequency (%)
26  1  < 0.1%
22  3  < 0.1%
21  1  < 0.1%
20  1  < 0.1%
19  6  < 0.1%

SogoChakukaisu2
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.918928976
Minimum: 0
Maximum: 18
Zeros: 111847
Zeros (%): 33.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 6
Maximum: 18
Range: 18
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.130110294
Coefficient of variation (CV): 1.110051659
Kurtosis: 2.443768864
Mean: 1.918928976
Median Absolute Deviation (MAD): 1
Skewness: 1.443317206
Sum: 649734
Variance: 4.537369866
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Value  Count  Frequency (%)
0  111847  33.0%
1  70409  20.8%
2  52178  15.4%
3  37945  11.2%
4  24893  7.4%
5  16224  4.8%
6  11037  3.3%
7  6721  2.0%
8  3531  1.0%
9  1636  0.5%
Other values (8)  2171  0.6%

Value  Count  Frequency (%)
0  111847  33.0%
1  70409  20.8%
2  52178  15.4%
3  37945  11.2%
4  24893  7.4%

Value  Count  Frequency (%)
18  8  < 0.1%
17  2  < 0.1%
15  47  < 0.1%
14  61  < 0.1%
13  183  0.1%

SogoChakukaisu3
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.844393843
Minimum: 0
Maximum: 17
Zeros: 104875
Zeros (%): 31.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 6
Maximum: 17
Range: 17
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.963390025
Coefficient of variation (CV): 1.064517773
Kurtosis: 2.370173615
Mean: 1.844393843
Median Absolute Deviation (MAD): 1
Skewness: 1.397463253
Sum: 624497
Variance: 3.854900391
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value  Count  Frequency (%)
0  104875  31.0%
1  76982  22.7%
2  57608  17.0%
3  39159  11.6%
4  25580  7.6%
5  14874  4.4%
6  8834  2.6%
7  5310  1.6%
8  2763  0.8%
9  1459  0.4%
Other values (7)  1148  0.3%

Value  Count  Frequency (%)
0  104875  31.0%
1  76982  22.7%
2  57608  17.0%
3  39159  11.6%
4  25580  7.6%

Value  Count  Frequency (%)
17  13  < 0.1%
15  18  < 0.1%
14  18  < 0.1%
13  78  < 0.1%
12  180  0.1%

SogoChakukaisu4
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.777679331
Minimum: 0
Maximum: 20
Zeros: 105235
Zeros (%): 31.1%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 6
Maximum: 20
Range: 20
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.902520452
Coefficient of variation (CV): 1.070227019
Kurtosis: 3.341594241
Mean: 1.777679331
Median Absolute Deviation (MAD): 1
Skewness: 1.496419203
Sum: 601908
Variance: 3.619584068
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value  Count  Frequency (%)
0  105235  31.1%
1  80862  23.9%
2  58672  17.3%
3  38340  11.3%
4  23939  7.1%
5  14131  4.2%
6  8726  2.6%
7  4267  1.3%
8  2213  0.7%
9  1156  0.3%
Other values (9)  1051  0.3%

Value  Count  Frequency (%)
0  105235  31.1%
1  80862  23.9%
2  58672  17.3%
3  38340  11.3%
4  23939  7.1%

Value  Count  Frequency (%)
20  1  < 0.1%
19  49  < 0.1%
17  3  < 0.1%
16  2  < 0.1%
14  54  < 0.1%

SogoChakukaisu5
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.708791111
Minimum: 0
Maximum: 16
Zeros: 104212
Zeros (%): 30.8%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 5
Maximum: 16
Range: 16
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.815439766
Coefficient of variation (CV): 1.062411756
Kurtosis: 3.096025761
Mean: 1.708791111
Median Absolute Deviation (MAD): 1
Skewness: 1.49465968
Sum: 578583
Variance: 3.295821544
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value  Count  Frequency (%)
0  104212  30.8%
1  86087  25.4%
2  58785  17.4%
3  39537  11.7%
4  22667  6.7%
5  13180  3.9%
6  6957  2.1%
7  3472  1.0%
8  1685  0.5%
9  937  0.3%
Other values (6)  1073  0.3%

Value  Count  Frequency (%)
0  104212  30.8%
1  86087  25.4%
2  58785  17.4%
3  39537  11.7%
4  22667  6.7%

Value  Count  Frequency (%)
16  12  < 0.1%
15  2  < 0.1%
13  158  < 0.1%
12  99  < 0.1%
11  304  0.1%

SogoChakukaisu6
Real number (ℝ≥0)

Distinct: 65
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.81584326
Minimum: 0
Maximum: 89
Zeros: 2637
Zeros (%): 0.8%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 6
median: 10
Q3: 16
95-th percentile: 28
Maximum: 89
Range: 89
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 8.223059731
Coefficient of variation (CV): 0.6959350723
Kurtosis: 2.980408519
Mean: 11.81584326
Median Absolute Deviation (MAD): 5
Skewness: 1.340479696
Sum: 4000750
Variance: 67.61871134
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
5  21785  6.4%
6  21232  6.3%
8  19737  5.8%
4  19551  5.8%
7  19490  5.8%
9  18224  5.4%
3  17472  5.2%
10  17146  5.1%
11  15974  4.7%
13  14444  4.3%
Other values (55)  153537  45.3%

Value  Count  Frequency (%)
0  2637  0.8%
1  7276  2.1%
2  12666  3.7%
3  17472  5.2%
4  19551  5.8%

Value  Count  Frequency (%)
89  19  < 0.1%
71  50  < 0.1%
69  2  < 0.1%
67  28  < 0.1%
61  42  < 0.1%

ChuoChakukaisu1
Real number (ℝ≥0)

ZEROS

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.672210212
Minimum: 0
Maximum: 18
Zeros: 124139
Zeros (%): 36.7%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 5
Maximum: 18
Range: 18
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.835322284
Coefficient of variation (CV): 1.097542804
Kurtosis: 1.68257104
Mean: 1.672210212
Median Absolute Deviation (MAD): 1
Skewness: 1.222755774
Sum: 566197
Variance: 3.368407887
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Value  Count  Frequency (%)
0  124139  36.7%
1  68198  20.1%
2  48772  14.4%
3  41321  12.2%
4  28017  8.3%
5  15402  4.5%
6  6495  1.9%
7  3603  1.1%
8  1450  0.4%
9  725  0.2%
Other values (5)  470  0.1%

Value  Count  Frequency (%)
0  124139  36.7%
1  68198  20.1%
2  48772  14.4%
3  41321  12.2%
4  28017  8.3%

Value  Count  Frequency (%)
18  26  < 0.1%
13  7  < 0.1%
12  34  < 0.1%
11  101  < 0.1%
10  302  0.1%

ChuoChakukaisu2
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.636406058
Minimum: 0
Maximum: 18
Zeros: 143149
Zeros (%): 42.3%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 6
Maximum: 18
Range: 18
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.055952412
Coefficient of variation (CV): 1.256382792
Kurtosis: 2.897165962
Mean: 1.636406058
Median Absolute Deviation (MAD): 1
Skewness: 1.589431578
Sum: 554074
Variance: 4.22694032
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value  Count  Frequency (%)
0  143149  42.3%
1  61084  18.0%
2  45486  13.4%
3  32884  9.7%
4  20840  6.2%
5  14596  4.3%
6  9273  2.7%
7  5471  1.6%
8  2791  0.8%
9  1421  0.4%
Other values (7)  1597  0.5%

Value  Count  Frequency (%)
0  143149  42.3%
1  61084  18.0%
2  45486  13.4%
3  32884  9.7%
4  20840  6.2%

Value  Count  Frequency (%)
18  8  < 0.1%
15  38  < 0.1%
14  45  < 0.1%
13  114  < 0.1%
12  361  0.1%

ChuoChakukaisu3
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.581546522
Minimum: 0
Maximum: 13
Zeros: 133347
Zeros (%): 39.4%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 5
Maximum: 13
Range: 13
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.889299417
Coefficient of variation (CV): 1.194589846
Kurtosis: 2.490329105
Mean: 1.581546522
Median Absolute Deviation (MAD): 1
Skewness: 1.503533842
Sum: 535499
Variance: 3.569452286
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value  Count  Frequency (%)
0  133347  39.4%
1  70466  20.8%
2  50779  15.0%
3  33472  9.9%
4  21710  6.4%
5  12750  3.8%
6  7670  2.3%
7  4288  1.3%
8  1978  0.6%
9  1407  0.4%
Other values (4)  725  0.2%

Value  Count  Frequency (%)
0  133347  39.4%
1  70466  20.8%
2  50779  15.0%
3  33472  9.9%
4  21710  6.4%

Value  Count  Frequency (%)
13  53  < 0.1%
12  168  < 0.1%
11  201  0.1%
10  303  0.1%
9  1407  0.4%

ChuoChakukaisu4
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.53417978
Minimum: 0
Maximum: 19
Zeros: 130810
Zeros (%): 38.6%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 5
Maximum: 19
Range: 19
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.820622714
Coefficient of variation (CV): 1.186707541
Kurtosis: 3.705495842
Mean: 1.53417978
Median Absolute Deviation (MAD): 1
Skewness: 1.608738684
Sum: 519461
Variance: 3.314667069
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Value  Count  Frequency (%)
0  130810  38.6%
1  75520  22.3%
2  52535  15.5%
3  33671  9.9%
4  19983  5.9%
5  11814  3.5%
6  7594  2.2%
7  3289  1.0%
8  1822  0.5%
9  835  0.2%
Other values (5)  719  0.2%

Value  Count  Frequency (%)
0  130810  38.6%
1  75520  22.3%
2  52535  15.5%
3  33671  9.9%
4  19983  5.9%

Value  Count  Frequency (%)
19  49  < 0.1%
13  57  < 0.1%
12  90  < 0.1%
11  196  0.1%
10  327  0.1%

ChuoChakukaisu5
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.479526982
Minimum: 0
Maximum: 13
Zeros: 127972
Zeros (%): 37.8%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 5
Maximum: 13
Range: 13
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.73247967
Coefficient of variation (CV): 1.170968621
Kurtosis: 3.27454378
Mean: 1.479526982
Median Absolute Deviation (MAD): 1
Skewness: 1.592288163
Sum: 500956
Variance: 3.001485807
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value  Count  Frequency (%)
0  127972  37.8%
1  82401  24.3%
2  52329  15.5%
3  34377  10.2%
4  19243  5.7%
5  11222  3.3%
6  5482  1.6%
7  2698  0.8%
8  1291  0.4%
9  794  0.2%
Other values (4)  783  0.2%

Value  Count  Frequency (%)
0  127972  37.8%
1  82401  24.3%
2  52329  15.5%
3  34377  10.2%
4  19243  5.7%

Value  Count  Frequency (%)
13  35  < 0.1%
12  131  < 0.1%
11  235  0.1%
10  382  0.1%
9  794  0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
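
The calculation just described can be sketched in a few lines of plain Python. This is an illustration of the formula, not the code the report itself runs; the function name `pearson_r` and the toy data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r(xs, ys))
# Shifting and rescaling xs by a positive factor leaves r unchanged.
print(pearson_r([10 * x + 3 for x in xs], ys))
```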

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
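
As a sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The example assumes that tied values share the average of the ranks they span, a common convention that the text above does not specify; the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (dev_x * dev_y)


# A monotonic but nonlinear relationship (y = x**3) still gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```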

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
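
The pair-counting definition above corresponds to the variant usually called τ-a, sketched below in plain Python. Statistical libraries often apply a tie correction (τ-b), which matters for columns with many repeated values such as the zero-heavy ones in this dataset; whether the report uses such a correction is not stated here, so the sketch follows the formula in the text only.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop is quadratic in the number of rows, so on a table of this size one would normally rely on a library routine rather than this direct count.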

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for φk.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
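
A minimal sketch of a bias-corrected Cramér's V, following the correction commonly attributed to Bergsma (2013): the squared phi coefficient and the table dimensions are each shrunk by a term of order 1/(n-1) before forming V. This is an illustration under that assumption, not the report's own implementation; the function name and toy data are invented.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    rows, cols = Counter(xs), Counter(ys)
    cells = Counter(zip(xs, ys))
    r, k = len(rows), len(cols)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # no association can be measured
    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (cells.get((a, b), 0) - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Perfectly associated labels versus unrelated labels.
print(cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x"] * 50 + ["y"] * 50))
print(cramers_v_corrected(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```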

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

Syotai BreederCode SanchiName RuikeiHonsyoHeiti RuikeiHonsyoSyogai RuikeiFukaHeichi RuikeiFukaSyogai RuikeiSyutokuHeichi RuikeiSyutokuSyogai SogoChakukaisu1 SogoChakukaisu2 SogoChakukaisu3 SogoChakukaisu4 SogoChakukaisu5 SogoChakukaisu6 ChuoChakukaisu1 ChuoChakukaisu2 ChuoChakukaisu3 ChuoChakukaisu4 ChuoChakukaisu5
0NaN530331新冠町35010000075000033013833013
1NaN530331新冠町35010000075000033013833013
2NaN530331新冠町35010000075000033013833013
3NaN530331新冠町35010000075000033013833013
4NaN400317新冠町50070001720095000033432933432
5NaN400317新冠町50070001720095000033432933432
6NaN400317新冠町50070001720095000033432933432
7NaN400317新冠町50070001720095000033432933432
8NaN400317新冠町50070001720095000033432933432
9NaN400317新冠町50070001720095000033432933432

Last rows

Syotai BreederCode SanchiName RuikeiHonsyoHeiti RuikeiHonsyoSyogai RuikeiFukaHeichi RuikeiFukaSyogai RuikeiSyutokuHeichi RuikeiSyutokuSyogai SogoChakukaisu1 SogoChakukaisu2 SogoChakukaisu3 SogoChakukaisu4 SogoChakukaisu5 SogoChakukaisu6 ChuoChakukaisu1 ChuoChakukaisu2 ChuoChakukaisu3 ChuoChakukaisu4 ChuoChakukaisu5
338582佐賀950570熊本350000570019000030201000100
338583佐賀513174鹿児島00005000010010100000
338584NaN700014浦河町240000000001000301000
338585NaN393126千歳市00000000000100000
338586NaN100046浦河町9200000040000010020310020
338587佐賀300337新冠0000150000101021400000
338588NaN5407059130000704003200000423301242330
338589NaN500319新冠町00000000000200000
338590NaN373126安平町00000000010500000
338591NaN130040浦河町00000000000100000

Duplicate rows

Most frequent

Syotai BreederCode SanchiName RuikeiHonsyoHeiti RuikeiHonsyoSyogai RuikeiFukaHeichi RuikeiFukaSyogai RuikeiSyutokuHeichi RuikeiSyutokuSyogai SogoChakukaisu1 SogoChakukaisu2 SogoChakukaisu3 SogoChakukaisu4 SogoChakukaisu5 SogoChakukaisu6 ChuoChakukaisu1 ChuoChakukaisu2 ChuoChakukaisu3 ChuoChakukaisu4 ChuoChakukaisu5 count
46川崎533423日高110000059003100003332111002116
72笠松100004様似0000850002021215000005
73笠松130055浦河0000900002042218000005
91笠松633087新ひだか0000110000400108000005
68笠松3512日高0000550002631116000004
71笠松33468日高00001050005220520000004
101笠松803029新ひだか00001400004153113000004
104笠松900011浦河0000800002241420000004
122香港0420000031200001075037010024
124香港0230000034400006547117010004